The 1948 Selective Service Act established a process
whereby all United States (US) military applicants take an
aptitude test to measure their suitability for military job
specialties. The latest version of these tests, the Armed
Services Vocational Aptitude Battery (ASVAB), was introduced
in 1968. Approximately 900,000 high school students from
14,000 US high schools take the ASVAB test each year. This
"paper and pencil" test requires the applicant to answer
multiple choice questions (items) on a printed form. The
creation of paper and pencil forms in one of the ten test
topics is called form assembly. Form assembly consists of
picking 20 to 35 items from an item pool of about 300 items
such that: 1) each item appears on at most one form; 2) each
form's result represents the applicant's capability; and 3)
each form has the same level of difficulty. This thesis
models the creation of paper and pencil forms as a mixed
integer linear goal program and solves the problem both
optimally and heuristically. Computational results for seven
ASVAB tests show that both methods help improve the form
assembly process.
EXECUTIVE SUMMARY



The 1948 Selective Service Act established a
process whereby all United States military applicants take
an aptitude test to measure their suitability for military
job specialties. The latest version of these tests, the
Armed Services Vocational Aptitude Battery (ASVAB), was
introduced in 1968. Approximately 900,000 high school
students from 14,000 US high schools take the ASVAB test
each year. This "paper and pencil" test requires the
applicant to answer multiple choice questions (items) on a
printed form. The Defense Manpower Data Center, as the
executive agency for the ASVAB, is responsible for the
design, development and creation of the tests. The creation
of paper and pencil forms in one of the ten test topics is
called form assembly. Form assembly consists of picking 20
to 35 items from an item pool of about 300 items such that:
1) each item appears on at most one form; 2) each form's
result represents the applicant's capability; and 3) each
form has the same level of difficulty. This thesis models
the creation of paper and pencil forms as a mixed integer
linear goal program. One approach solves the program using
commercially available optimization software. A second
approach uses a local search with random restart heuristic.
Both approaches yield good solutions. Computational results
for the seven ASVAB tests show that combining both methods
can improve the form assembly process. The Defense Manpower
Data Center benefits from these computational results.
I. INTRODUCTION



The 1948 Selective Service Act established a process
whereby all United States (US) military applicants take an
aptitude test to measure their suitability for military job
specialties. The latest version of these tests, the Armed
Services Vocational Aptitude Battery (ASVAB), was introduced
in 1968. A US Air Force Human Resources Laboratory study in
1973 calculated cost avoidance from these tests at $76.8
million per year for enlisted technical training [US Air
Force Human Resources Laboratory 1973].

The ASVAB is currently given in about 14,000 US high
schools to about 900,000 potential applicants each year
[Defense Manpower Data Center 1992]. This "paper and pencil"
test requires the applicant to answer multiple choice
questions (items). Each question has one correct answer that
must be selected, on average, from a total of four choices.
The ASVAB test consists of ten different areas of expertise.
The categories, each containing between 20 and 35 items,
are Arithmetic Reasoning (AR), Auto and Shop (AS), Coding
Speed (CS), Electronics Information (EI), General Science
(GS), Mechanical Comprehension (MC), Mathematical Knowledge
(MK), Numerical Operations (NO), Paragraph Comprehension
(PC), and Word Knowledge (WK).

The model developed in this thesis addresses only seven
of the ten tests. These seven tests were chosen because they
are similarly structured: the choice of the next eligible
item is independent of the items chosen before. In other
words, there is no dependency among items from the
perspective of the form assembly process.
The creation of paper and pencil forms for each category
is called "form assembly." Multiple forms must be
created in each category so that all applicants are not
tested using the same form. "Form assembly" consists of
picking 20 to 35 items from a pool of about 300 items such
that: 1) each item appears on at most one form; 2) each
form's result represents the applicant's capability; and 3)
each form has the same level of difficulty. The item pool
itself can be split into several item groups, where each
group, called a taxonomy, requires a certain number of items
per form.

This thesis models the creation of paper and pencil
forms as a mixed integer linear goal program and solves the
problem both optimally and heuristically.

A. TEST THEORY BACKGROUND 

The measurement of a person's ability or skill level
(denoted θ) is commonly discretized into 100 intervals, so
that each level can be expressed as a percentage. These
intervals are then called percentiles of the ability. The
skill level distribution over the potential applicant
population is approximately normal, allowing percentiles to
be ranked from -3σ to +3σ around a mean. A reasonable
assumption is that the probability p of answering an item
correctly increases as the percentile increases, with p
approaching 1 as the percentile goes to +3σ. Hence, this
probability can be represented by a logistic function,
referred to as an item response curve. A common model [Lord
1980] uses a three-parameter logistic function like the one
adapted from Lord and Novick [1968] (Figure 1) with

\[
p(\theta) = c + \frac{1 - c}{1 + e^{-1.7\,a\,(\theta - b)}} .
\]

Parameter a is a proportionality factor for the slope
at the inflection point. It represents the discriminating
power; in other words, how well an item distinguishes
between applicants. Figure 2 shows an example where item 1
has a steeper curve in the percentile range (50,60) than
item 2 and therefore provides greater discrimination between
individuals at percentiles 50 and 60.




Figure 1: Parameters of the Logistic Function.

The logistic function represents the probability of answering an
item correctly and is defined with parameters (a, b and c). Parameter a
is proportional to the slope at the inflection point: slope = 0.425a(1-c).
Parameter b indicates an item's difficulty level by defining the
position of an item's curve along the ability scale θ. Parameter c
indicates the guessing parameter [Lord 1980].

Parameter b indicates an item's difficulty level by
defining the position of an item's curve along the ability
scale θ (i.e., the percentile at which the probability of a
correct answer is 0.5 when c = 0; for c > 0 the logistic
function above gives p(b) = (1 + c)/2).

Parameter c indicates the guessing parameter, or the
probability of answering an item correctly given an ability
more than 3σ below the mean [Lord 1980]. This guessing
parameter does not necessarily equal the probability of
selecting the one correct answer at random from the number
of possible choices.
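
The roles of the three parameters can be checked numerically. The following is a minimal sketch, not code from the thesis: the function name p_correct and the parameter values are hypothetical, and ability is expressed in standard deviations from the mean rather than in percentiles.

```python
import math

def p_correct(theta, a, b, c, scale=1.7):
    """Three-parameter logistic probability of a correct answer at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))

# Hypothetical item: discrimination a, difficulty b, guessing parameter c.
a, b, c = 1.2, 0.0, 0.25

p_low = p_correct(-3.0, a, b, c)   # close to the guessing level c
p_mid = p_correct(b, a, b, c)      # equals (1 + c) / 2 at theta = b
p_high = p_correct(3.0, a, b, c)   # close to 1

# Central finite difference: slope of the curve at the inflection point theta = b.
h = 1e-6
slope_at_b = (p_correct(b + h, a, b, c) - p_correct(b - h, a, b, c)) / (2 * h)
```

Under these assumptions the slope at theta = b agrees with the expression 0.425a(1-c) quoted in the caption of Figure 1.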




Figure 2: Example of the Discriminating Power. 

Figure 2 provides an example of the discriminating power of two items 
for two applicants with percentiles 50 and 60. Item 1 has a steeper 
curve in the percentile range (50,60) than item 2 and therefore provides 
greater discrimination between individuals at percentiles 50 and 60. 

In practice, 1,000 to 10,000 applicants pretest an item
and the parameters a, b and c are estimated from the
results. From the item response curve, an item information
curve is determined (Figure 3). The item information curve
describes the potential information contribution of an item
to a test form at each percentile. These item information
curves comprise the bulk of the data for this thesis.
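
The thesis takes the item information curves as given data and does not print the formula used to derive them. As a hedged illustration only, the sketch below assumes the information function commonly associated with the three-parameter logistic model in the item response theory literature, I(θ) = (1.7a)² · ((1 − p)/p) · ((p − c)/(1 − c))². The function names, the item parameters and the evenly spaced 100-point ability grid are assumptions, not values from the thesis.

```python
import math

SCALE = 1.7  # scaling constant of the logistic model

def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-SCALE * a * (theta - b)))

def item_information(theta, a, b, c):
    """Assumed 3PL information: (1.7a)^2 * (Q/P) * ((P - c)/(1 - c))^2."""
    p = p_correct(theta, a, b, c)
    return (SCALE * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

# Assumed grid: 100 ability points evenly spaced from -3 to +3 standard deviations.
grid = [-3.0 + 6.0 * k / 99 for k in range(100)]

# Information curve of one hypothetical item, one value per grid point.
curve = [item_information(theta, a=1.0, b=0.0, c=0.2) for theta in grid]
```

With c = 0 the expression reduces to (1.7a)² p(1 − p), which peaks at θ = b.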

These item information curves are independent and
additive when it is assumed that the information contribution
of an item to the whole form does not depend on other items
included on the form [Lord 1980]. Therefore all of a form's
item information curves can be added to get an overall
information curve. This overall information curve is commonly
denoted as the precision of the form.
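
Additivity means a form's curve is the point-by-point sum of the curves of its items. A minimal sketch follows; the function name and the three short integer curves are made up for illustration and are not data from the thesis.

```python
def form_information(item_curves):
    """Add item information curves point by point to obtain the form's curve."""
    return [sum(values) for values in zip(*item_curves)]

# Three hypothetical items, each with information values at three ability points
# (scaled to integers purely for readability).
item_curves = [
    [100, 400, 200],
    [300, 300, 100],
    [0, 200, 500],
]
precision = form_information(item_curves)
```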




Figure 3: Item Information Curves. 

Figure 3 displays examples of different item information curves. These 
curves describe the potential information contribution of an item to a 
test form at each percentile. 

Empirical research and testing have produced a "reference
curve" for each test representing the desired
information distribution over a form's percentiles. Since
the establishment of a standard reference curve in 1980,
some item pools have changed and it is now possible to
provide forms with "better" information curves than the
reference curve. In such cases, these curves are the new
desired information distribution, but for historical reasons
they are not called reference curves. Regardless, in this
thesis, we refer to the preferred curve as the "goal curve."






B. OUTLINE 



Chapter II provides information about research related
to this thesis. Chapter III formulates the form assembly
process as a mixed integer linear goal programming problem
and discusses a heuristic to solve it. Chapter IV provides
results obtained from solving the formulation using a
heuristic and the General Algebraic Modeling System (GAMS)
[Brooke, Kendrick and Meeraus 1992] with the solver OSL
[GAMS 1995]. Chapter V compares the two solution methods and
presents conclusions.
II. RELATED RESEARCH



The bulk of the literature on aptitude and ability
tests involves the concept of item validity [Lord 1980].
Validity in this case is taken to be the extent to which a
test score actually predicts future performance. Toquam,
Corpe and Dunette [1991] review more than 10,000 articles
related to validity as it pertains to ability tests. Their
literature review highlights the significant effort
associated with this issue. With respect to the ASVAB
specifically, Maier and Taruss [1985] give an example of that
test's predictability. In this study, the authors demonstrate
that performance on the ASVAB tests is statistically related
to training outcome measures of various US Marine Corps
technical schools.

The present study uses data provided by the Defense
Manpower Data Center (DMDC). As explained in Chapter I,
these data consist of roughly 300 item information
curves, each curve derived by standard statistical
procedures [Lord and Novick 1968] from item response curves.
These data are assumed to be representative with respect to
the validity issue. Accordingly, the DMDC data are used in
the present study simply to demonstrate a methodological
approach to "form assembly." They are not being used
to demonstrate their predictive validity.

Unlike the validity literature, there exist only a few
publications addressing assembly or construction of ability
or aptitude tests. Berger, Gupta and Berger [1988] present
the construction of Form P for the Air Force Officer
Qualifying Test (AFOQT). They develop two forms of the test
by adding new items to an old form. The objective is to
construct two new forms which are equivalent and parallel to
the original form. "Equivalence" means that each form has
the same information content. "Parallel" means that the
outcome of the test is independent of the form the applicant
has taken. Their approach is a straightforward heuristic.
They select items with the most discriminating power from
the old form; check them against new items; and replace old
items with new items that provide the best match; that is,
a match which produces the smallest information differences
between the old and the new form.

Baker and Wall [1996] use a form assembly similar to
the heuristic approach presented in this thesis. They focus
on a statistical analysis of the Interest Finder Test, a
test to help students explore their occupational and career
interests [DMDC 1992]. They describe form assembly as
consisting of two stages. The first stage screens the item
pool and the second stage uses a heuristic algorithm to
assign items to the form. Their heuristic selects an initial
group of items and exchanges items when a replacement
improves the form. The objective function is a weighted
function that minimizes statistical differences between the
current form and a desired form. These statistical
differences are essentially the mean and standard deviation
of scaling parameters for the test. The actual criteria for
the initial item selection and results with respect to form
assembly are beyond the scope of this thesis.

In summary, the literature review did not reveal prior
attempts to use optimization in form assembly and only
provided scant references to the use of heuristic approaches.
The next chapter discusses the optimization and heuristic
approaches.
III. OPTIMIZATION MODEL AND HEURISTIC



A. OPTIMIZATION MODEL 

The form assembly problem can be formulated as a mixed
integer linear goal programming problem (see Charnes and
Cooper [1961] for a discussion of goal programming)
consisting of two goals. One goal is to assemble forms so each
form's information curve is as close as possible to the goal
curve. The second goal is to make the forms' information
curves as "parallel" as possible to one another. The
"parallel" goal seeks an exam whose results are independent
of the form the applicant has taken. An exam with all forms
exactly matching the goal curve would simultaneously satisfy
both goals, but this is typically not possible. The parallel
goal therefore encourages each form to be close to the goal
curve.

We implement the first goal by splitting the deviation
from the goal curve into groups, where deviation within a
group has the same penalty per unit and groups closer to
the goal curve have a smaller penalty per unit.
Figure 4 provides an example of the penalty groups. Any
vertical deviation between the goal curve and form curve is
penalized.
Figure 4: Penalty Groups.

This figure displays at percentile 68 how deviation from the goal curve
can be measured in different groups. The vertical distance Δ1 would be
penalized per unit with the penalty for group 1 for those units of Δ1
within group 1 and with the penalty per unit for group 2 for those units
of Δ1 within group 2. Since it is desired to be as close to the goal
curve as possible, group 1's penalty per unit would be less than group
2's penalty per unit.
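
The grouped penalty can be illustrated with a short calculation. This is a sketch under stated assumptions, not the thesis implementation: the function name is hypothetical, and the group widths and penalties are the AR-Test settings reported in Chapter IV (widths 0.01, 0.05 and 0.10 with penalties 0.00001, 1 and 5, plus an unbounded fourth group at 100). Because the penalty per unit grows with distance from the goal curve, a minimizing solver uses the cheapest groups first, which the loop below imitates.

```python
def penalized_deviation(deviation, caps, penalties, overflow_penalty):
    """Charge a deviation group by group, starting with the group nearest the goal curve."""
    remaining = abs(deviation)
    cost = 0.0
    for cap, penalty in zip(caps, penalties):
        used = min(remaining, cap)      # units of deviation that fall into this group
        cost += penalty * used
        remaining -= used
    # Anything beyond the bounded groups goes to the unbounded group.
    return cost + overflow_penalty * remaining

CAPS = (0.01, 0.05, 0.10)        # CAT values for groups 1 to 3 (AR-Test)
PENALTIES = (0.00001, 1.0, 5.0)  # penalty per unit for groups 1 to 3 (AR-Test)

small = penalized_deviation(0.03, CAPS, PENALTIES, 100.0)  # reaches group 2
large = penalized_deviation(0.20, CAPS, PENALTIES, 100.0)  # spills into group 4
```

A deviation of 0.03 costs about 0.02, while a deviation of 0.20 costs about 4.55, so deviations that leave the bounded groups dominate the objective.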



The formulation follows.

Indices:

i : item from the item pool;
p : percentile (ability level);
f : form to be assembled (1, 2, ..., F);
t : taxonomy (1, 2, ..., T); and
g : penalty group.

Data:

CAT_g : the maximum deviation between a form and the goal curve in group g;
INF_ip : information value of item i at percentile p;
NITEM_t : the required number of items in taxonomy t;
PARAWEI : weight that combines the two goals;
PENALTY_g : penalty per unit deviation within group g; and
SHAPE_p : the information value for the goal curve at percentile p.

Variables:

X_if : 1, if item i is used on form f;
py_pfg : deviation above the desired shape in group g at percentile p on form f;
ny_pfg : deviation below the desired shape in group g at percentile p on form f;
Delplus_f : the total information form 1 contains that exceeds form f; and
Delneg_f : the total information form f contains that exceeds form 1.



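Formulation:

\[
\min \; \sum_{p}\sum_{f}\sum_{g} \text{PENALTY}_g \,(py_{pfg} + ny_{pfg})
\;+\; \text{PARAWEI} \sum_{f>1} (\text{Delplus}_f + \text{Delneg}_f) \tag{1}
\]

subject to

\[
\sum_{g} py_{pfg} \;\ge\; \sum_{i} \text{INF}_{ip}\, X_{if} - \text{SHAPE}_p \qquad \forall p, f \tag{2}
\]

\[
\sum_{g} ny_{pfg} \;\ge\; -\sum_{i} \text{INF}_{ip}\, X_{if} + \text{SHAPE}_p \qquad \forall p, f \tag{3}
\]

\[
\sum_{i \in t} X_{if} \;=\; \text{NITEM}_t \qquad \forall f, t \tag{4}
\]

\[
\sum_{f} X_{if} \;\le\; 1 \qquad \forall i \tag{5}
\]

\[
\sum_{p}\sum_{i} \text{INF}_{ip}\, X_{i1} - \sum_{p}\sum_{i} \text{INF}_{ip}\, X_{if}
\;=\; \text{Delplus}_f - \text{Delneg}_f \qquad \forall f > 1 \tag{6}
\]

\[
0 \le py_{pfg} \le \text{CAT}_g \qquad \forall p, f, g \tag{7}
\]

\[
0 \le ny_{pfg} \le \text{CAT}_g \qquad \forall p, f, g \tag{8}
\]

\[
X_{if} \ \text{binary} \qquad \forall i, f
\]

\[
\text{Delplus}_f,\ \text{Delneg}_f \ge 0 \qquad \forall f .
\]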

The first component of the objective function,

\[
\sum_{p}\sum_{f}\sum_{g} \text{PENALTY}_g \,(py_{pfg} + ny_{pfg}),
\]

minimizes the vertical distances (weighted deviation) between
the goal curve and the assembled forms. The second
component,

\[
\text{PARAWEI} \sum_{f>1} (\text{Delplus}_f + \text{Delneg}_f),
\]

encourages forms to have the same information. A second
component having value zero does not necessarily imply
parallel forms, since the vertical distances at percentile p
from form 1 to form f can have positive or negative signs
depending on whether form f is above or below form 1. These
positive and negative distances can sum to zero, producing
two forms where Delplus_f = Delneg_f = 0. Nevertheless, the
second component has empirically produced parallel forms and
requires only F-1 additional constraints. Constraints (2)
and (3) determine the positive and negative deviation at
each percentile between the assembled forms and the goal
curve. Constraint (4) ensures the required number of items
per taxonomy is satisfied. Constraint (5) ensures that each
item is used at most once. Constraint (6) determines the
total information difference between form 1 and other forms.
Constraints (7) and (8) bound the positive and negative
deviations.
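
The caveat about the second component can be made concrete. The sketch below is illustrative only: the function name and the two three-point curves are hypothetical. It evaluates the pair of deviation variables from constraint (6) as a minimizing solver would set them, with at most one of the two nonzero because both are penalized.

```python
def parallel_deviation(form1_curve, other_curve):
    """Return (Delplus, Delneg) for constraint (6); only total information is compared."""
    diff = sum(form1_curve) - sum(other_curve)
    return (max(diff, 0), max(-diff, 0))

# Two hypothetical forms with clearly different shapes but equal total information.
form1 = [2, 4, 6]
form2 = [6, 4, 2]

delplus, delneg = parallel_deviation(form1, form2)
pointwise_gaps = [x - y for x, y in zip(form1, form2)]  # [-4, 0, 4], sums to zero
```

Both deviation variables are zero here even though the curves differ at two of the three points, which is exactly the situation the text describes.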

B. HEURISTIC APPROACH

Solving the previous problem optimally has taken
extensive computation time, as shown in the next chapter. To
provide solutions quickly, a local search with random restart
heuristic (e.g., [Papadimitriou and Steiglitz 1982]) is
developed.

The main objectives for the heuristic are to quickly
complete one assembly and to quickly evaluate small
variations to the assembly. The heuristic uses only integer
arithmetic within efficient code to help improve
performance.

The heuristic starts by dividing the item pool into
arrays of items where each array corresponds to a taxonomy.
These sub-item pools are eligible sets (ES_t) for each
taxonomy.

Each form consists of vectors for each taxonomy
(Assign_tf). The algorithm consists of three main procedures
(Figure 5): fill_initial_forms; do_swap; and
improve_parallel.

Figure 6 displays the pseudocode for the procedure
fill_initial_forms. A random number generator [Lewis,
Goodmann and Miller 1969] is used to assemble the initial
forms subject to all constraints.
[Flowchart: do_swap followed by improve_parallel, repeated until sentinel]



Figure 5: Main Procedures of the Heuristic. 

This figure shows the main procedures for the heuristic algorithm. A 
loop over one assembly of all forms runs as often as the user has 
chosen. The best assembly is the result. 



Assign_tf <- ∅; initialize ES_t   {assume |ES_t| ≥ F*NITEM_t}
for f = 1 to F
  for t = 1 to T
    while |Assign_tf| < NITEM_t
      randomly select item i from ES_t
      Assign_tf <- Assign_tf ∪ {i}
      ES_t <- ES_t - {i}
    end
  end
end

Figure 6: The Pseudocode for the Procedure fill_initial_forms.

This figure shows how the heuristic randomly assembles the initial
forms. The indices and variables match those from the optimization
model. Assign_tf contains items on form f in taxonomy t. ES_t contains all
items in taxonomy t not currently used on any form.
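
The random initial assembly can be sketched in a few lines. This is an illustration, not the thesis code: the dictionary layout and function name are assumptions, and Python's built-in generator stands in for the random number generator cited in the text.

```python
import random

def fill_initial_forms(eligible, n_items, n_forms, seed=0):
    """Randomly assign items to forms, taxonomy by taxonomy.

    eligible: {taxonomy: [item ids]}, n_items: {taxonomy: items required per form}.
    Returns (assign, remaining) where assign[(t, f)] lists the items of
    taxonomy t on form f and remaining[t] is the eligible set left over.
    """
    rng = random.Random(seed)
    remaining = {t: list(items) for t, items in eligible.items()}
    assign = {}
    for f in range(n_forms):
        for t, pool in remaining.items():
            if len(pool) < n_items[t]:
                raise ValueError(f"taxonomy {t} has too few items left")
            chosen = rng.sample(pool, n_items[t])
            assign[(t, f)] = chosen
            for item in chosen:
                pool.remove(item)   # each item appears on at most one form
    return assign, remaining

# Hypothetical pool: two taxonomies, two forms.
eligible = {"t1": list(range(10)), "t2": list(range(10, 16))}
assign, remaining = fill_initial_forms(eligible, {"t1": 3, "t2": 2}, n_forms=2)
used = [item for items in assign.values() for item in items]
```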

The procedure do_swap defines a swap as the exchange of
an item from a form (i_out ∈ Assign_tf) with an item from the
appropriate eligible set (i_in ∈ ES_t). Figure 7 shows the
pseudocode for this procedure.



 1  improve <- 1
 2  while improve > 0
 3    improve <- 0
 4    for t = 1 to T
 5      for f = 1 to F
 6        for each item i_out ∈ Assign_tf
 7          sofar <- ObjFctValue_old
 8          Assign_tf <- Assign_tf - {i_out}
 9          for each item i_in ∈ ES_t
10            Assign_tf <- Assign_tf + {i_in}
11            calculate ObjFctVal_new
12            if ObjFctVal_new < sofar          (improvement)
13              sofar <- ObjFctVal_new
14              candidate = i_in
15            end if
16            Assign_tf <- Assign_tf - {i_in}
17          end
18          if sofar < ObjFctValue_old
19            swap candidate with i_out
20            update involved curves
21            improve <- improve + 1
22          end if
23        end
24      end
25    end
26  end while

Figure 7: The Pseudocode for the Procedure do_swap.

This figure shows how swapping items improves forms. ObjFctValue_old is
the sum of all deviations between form f and the goal curve before
potentially swapping an item and ObjFctVal_new is the sum after a
potential swap. The procedure repeats until no swap yields a decrease
to the objective function of any form.



The objective function value measuring the effectiveness of
the swap is the sum of all deviations between form f and the
goal curve. Improvement, as it is used in this context, means
a decrease of the objective function value caused by
swapping an item. This procedure runs through all forms and
eligible sets and checks whether a swap yields improvement.
The while-loop repeats as long as at least one improvement
is found across all forms and eligible sets.

To increase the speed of the algorithm, a baseline for
checking the swaps is used. A baseline in this context is
the sum of all item information curves currently assembled
without the item considered for exchange (i_out). Within the
pseudocode of Figure 7, the baseline can be calculated after
step 8; doing so reduces the computational effort needed
to determine the new objective function value in step 11.
Only the 100 information values of item i_in have to be added
to the baseline instead of summing over all items currently
assigned. The swap is executed after all items of the
eligible set have been examined, using the item that gives
the most improvement (candidate).
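
The swap procedure with the baseline shortcut can be sketched as follows. This is a minimal illustration under assumptions, not the thesis code: the data layout, the function names and the tiny example are hypothetical, the objective is taken to be the plain sum of absolute deviations from the goal curve, and (taxonomy, form) pairs are visited in dictionary order rather than in the loop order of Figure 7.

```python
def deviation(curve, goal):
    """Sum of absolute deviations between a form curve and the goal curve."""
    return sum(abs(c - g) for c, g in zip(curve, goal))

def do_swap(assign, eligible, info, goal):
    """Swap form items against unused items while any swap lowers a form's deviation.

    assign: {(t, f): [items]}, eligible: {t: [unused items]},
    info: {item: [integer information value per percentile]}.
    """
    improved = True
    while improved:
        improved = False
        for (t, f), items in assign.items():
            # Current curve of form f, summed over all of its taxonomies.
            on_form = [i for (_, ff), its in assign.items() if ff == f for i in its]
            curve = [sum(col) for col in zip(*(info[i] for i in on_form))]
            for pos in range(len(items)):
                i_out = items[pos]
                old = deviation(curve, goal)
                # Baseline: the form curve without the item considered for exchange.
                baseline = [c - v for c, v in zip(curve, info[i_out])]
                best, candidate = old, None
                for i_in in eligible[t]:
                    # Only the values of i_in are added to the baseline.
                    new = deviation([b + v for b, v in zip(baseline, info[i_in])], goal)
                    if new < best:
                        best, candidate = new, i_in
                if candidate is not None:
                    items[pos] = candidate
                    eligible[t].remove(candidate)
                    eligible[t].append(i_out)   # the removed item becomes eligible again
                    curve = [b + v for b, v in zip(baseline, info[candidate])]
                    improved = True
    return assign

# Hypothetical instance: one form, one taxonomy, two percentiles.
info = {0: [1, 1], 1: [5, 5], 2: [3, 3], 3: [2, 2]}
assign = {("t", 0): [0]}
eligible = {"t": [1, 2, 3]}
do_swap(assign, eligible, info, goal=[3, 3])
```

In this example item 0 is replaced by item 2, the only item whose curve matches the goal exactly, and item 0 returns to the eligible set.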

The procedure improve_parallel checks if swapping items
between forms can improve the forms. The procedure starts by
finding the form with the smallest sum of all deviations
from the goal curve so far. This best form is the one with
which the other forms have to be aligned. Figure 8 displays
the pseudocode for the procedure improve_parallel. At this
stage, the heuristic does not allow the objective function
to increase.

An improving swap between forms happens only after all
items within a taxonomy on all forms have been compared with
an item on the best form. The calculation of the curves uses
the baseline principle again. improve_parallel terminates
when no item is swapped on any form.
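
A simplified reading of this procedure is sketched below. It is an illustration under assumptions rather than the thesis implementation: the names and data layout are hypothetical, the curves are recomputed from scratch instead of using the baseline principle, and the best exchange partner is chosen separately for each other form.

```python
def total_dev(form, info, goal):
    """Sum of absolute deviations of one form (a {taxonomy: [items]} dict) from the goal."""
    items = [i for its in form.values() for i in its]
    curve = [sum(col) for col in zip(*(info[i] for i in items))]
    return sum(abs(c - g) for c, g in zip(curve, goal))

def improve_parallel(forms, info, goal):
    """Exchange items between the best form and the others while the pair improves."""
    improved = True
    while improved:
        improved = False
        best = min(range(len(forms)), key=lambda k: total_dev(forms[k], info, goal))
        for t in forms[best]:
            for pos_out in range(len(forms[best][t])):
                for f in range(len(forms)):
                    if f == best:
                        continue
                    a, b = forms[best][t], forms[f][t]
                    so_far = total_dev(forms[f], info, goal) + total_dev(forms[best], info, goal)
                    cand = None
                    for pos_in in range(len(b)):
                        a[pos_out], b[pos_in] = b[pos_in], a[pos_out]   # trial swap
                        new = total_dev(forms[f], info, goal) + total_dev(forms[best], info, goal)
                        a[pos_out], b[pos_in] = b[pos_in], a[pos_out]   # undo
                        if new < so_far:
                            so_far, cand = new, pos_in
                    if cand is not None:
                        a[pos_out], b[cand] = b[cand], a[pos_out]       # commit best swap
                        improved = True
    return forms

# Hypothetical instance: two forms, one taxonomy, one percentile, goal value 4.
info = {0: [3], 1: [3], 2: [1], 3: [1]}
forms = [{"t": [0, 1]}, {"t": [2, 3]}]
improve_parallel(forms, info, goal=[4])
```

Each committed swap strictly lowers the combined deviation of the two forms involved, so the objective never increases and the loop terminates.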
improve <- 1
while improve > 0
  improve <- 0
  find best form f_best
  for t = 1 to T
    for each item i_out ∈ Assign_t,f_best
      Assign_t,f_best <- Assign_t,f_best - {i_out}
      for f = 1 to F excluding f_best
        sofar <- (ObjFctValue_f + ObjFctValue_f_best)_old
        for each item i_in ∈ Assign_tf
          Assign_t,f_best <- Assign_t,f_best + {i_in}
          Assign_tf <- Assign_tf - {i_in} + {i_out}
          calculate ObjFctValues
          better? <- (ObjFctValue_f + ObjFctValue_f_best)_new
          if better? < sofar then          (improvement)
            sofar <- better?
            candidate_in = i_in
            candidate_out = i_out
          end if
          Assign_tf <- Assign_tf + {i_in} - {i_out}
          Assign_t,f_best <- Assign_t,f_best - {i_in}
        end
      end
      if sofar < (ObjFctValue_f + ObjFctValue_f_best)_old
        swap candidates
        update involved curves
        improve <- improve + 1
      end if
    end
  end
end while

Figure 8: The Pseudocode for the Procedure improve_parallel.

This figure shows swaps allowed between forms. A swap, given it improves
the objective function value, occurs after one item on the best form has
been compared with all other assigned items on the other forms.
IV. COMPUTATIONAL RESULTS



The task is to assemble forms for seven different
tests: Arithmetic Reasoning (AR), Auto and Shop (AS),
Electronics Information (EI), General Science (GS),
Mechanical Comprehension (MC), Mathematical Knowledge (MK),
and Word Knowledge (WK). Table 1 lists the test
specifications.



Test   Item pool size   Forms needed   Items on form   Taxonomies
AR     338              2              30              5
AS     196              2              25              2
EI     190              2              20              4
GS     313              2              25              12
MC     296              4              25              6
MK     327              4              25              5
WK     276              2              35              2



Table 1: Test Requirements and Item Pools.

This table lists the specifications for each of the tests. For example, 
the AR-Test requires the creation of two forms each having 30 items. The 
30 items, falling into five taxonomies, must be selected from an item 
pool of 338 items. 

A. OPTIMIZATION PARAMETER SETTINGS 

The optimization model formulated in the previous
chapter requires the specification of a number of
parameters. A summary sheet for each test contains results as
well as parameter settings. We use the AR-Test as an
example.
Figure 9 shows the implemented objective function. All
values were empirically developed. The penalties for the
unbounded variables, py4 and ny4, are 100. Other values are:
CAT_1 = 0.01; CAT_2 = 0.05; CAT_3 = 0.10;
PENALTY_1 = 0.00001; PENALTY_2 = 1.00; PENALTY_3 = 5.00; and
PARAWEI = 25.

\[
\sum_{f}\sum_{p} \bigl( 100\,py4_{pf} + 100\,ny4_{pf}
+ 0.00001\,py1_{pf} + 1\,py2_{pf} + 5\,py3_{pf}
+ 0.00001\,ny1_{pf} + 1\,ny2_{pf} + 5\,ny3_{pf} \bigr)
+ 25 \sum_{f} (\text{Delplus}_f + \text{Delneg}_f)
\]

Figure 9: The objective function parameters for the optimization model.
This figure shows the objective function implemented in GAMS for the
AR-Test. It measures the overall distance between the forms and the goal
curve at each percentile. The pys and nys are the deviation variables.
The term 25 * Σ(Delplus + Delneg) is the subgoal to encourage parallel
forms.

We use only upper bounds on the deviation variables
(CAT_g) for groups 1, 2 and 3. The following pages display,
for each test, the bounds for the penalty groups and the
weights for the subgoal.

B. OPTIMIZATION RESULTS 

This section shows results for the assembled tests. The
integrality gap provided is the difference between the best
integer solution identified and a lower bound on the
solution, expressed as a percentage of the lower bound. The
results for all tests are presented in alphabetical order.
Table 2 summarizes the numerical results obtained. Figures
10 to 16 show graphical results.



Test   objfctvalue        objfctval best   integrality gap   runtime
       lower bound (①)    solution (②)     (%) (③)           (seconds)
AR        865.97             932.33             7.6            15,260
AS      2,788.00           2,862.73             2.7               215
EI      9,489.65           9,561.94             1.0                17
GS      8,095.66           8,433.03             4.2               312
MC        125.04           1,187.11           850.0            50,000
MK      2,006.71           7,278.24           260.0            50,000
WK      3,588.31           5,188.42            39.2            13,934



Table 2: Numerical Results of the Optimization Assembly.

Table 2 summarizes all numerical results for tests assembled using
optimization, where objfctvalue = Objective Function Value. The
integrality gap provided is the difference between the best integer
solution identified and a lower bound on the solution, expressed as a
percentage of the lower bound (i.e., ③ = (② - ①)/①).
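
The gap definition in the caption can be applied directly. The sketch below is illustrative only: the function name is hypothetical, and the two input pairs are the AS-Test and GS-Test bounds from Table 2.

```python
def integrality_gap_percent(lower_bound, best_solution):
    """Gap between the best integer solution and the lower bound, in percent of the bound."""
    return 100.0 * (best_solution - lower_bound) / lower_bound

gap_as = integrality_gap_percent(2788.00, 2862.73)   # AS-Test bounds from Table 2
gap_gs = integrality_gap_percent(8095.66, 8433.03)   # GS-Test bounds from Table 2
```

Rounded to one decimal place, these two values agree with the gaps listed for the AS-Test and GS-Test.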



Model results come from an IBM RS6000 Model 590 
workstation using GAMS and the OSL solver. The model size 
varies, primarily according to the number of forms and the 
cardinality of the item pool. The approximate size of the 
largest model, MK-Test, is shown below: 

number of constraints: 1,150;

number of continuous variables: 4,500;

number of binary variables: 1,300; and

number of non-zero elements: 250,000.
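
Two of these figures can be cross-checked against the formulation. The count below is a rough sketch under assumptions: 100 percentiles per form, bounds (7) and (8) handled as variable bounds rather than constraint rows, and a hypothetical function name. It covers constraints and binary variables only.

```python
def model_size(n_items, n_forms, n_taxonomies, n_percentiles=100):
    """Count constraint rows (2) to (6) and binary variables X_if."""
    constraints = (
        2 * n_percentiles * n_forms   # constraints (2) and (3): one each per p and f
        + n_forms * n_taxonomies      # constraint (4): one per form and taxonomy
        + n_items                     # constraint (5): one per item
        + (n_forms - 1)               # constraint (6): one per form other than form 1
    )
    binaries = n_items * n_forms      # one X_if per item and form
    return constraints, binaries

# MK-Test: 327 items, 4 forms, 5 taxonomies (Table 1).
mk_constraints, mk_binaries = model_size(327, 4, 5)
```

Under these assumptions the MK-Test has 1,150 constraint rows and 1,308 binary variables, in line with the approximate sizes listed above.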
AR-Test (Arithmetic Reasoning):

General Requirements:
forms: 2;
items: 30 each; and
taxonomies: 5 (7, 8, 5, 5, 5 items in taxonomy 1 to 5).

Settings:
CAT-values: 0.01, 0.05, 0.1;
penalties: 0.00001, 1, 5;
PARAWEI: 25; and
item pool: 338 items.

Numerical Results:
objective function value (lower bound): 865.97;
objective function value (best solution): 932.33;
integrality gap: 7.6 %; and
runtime (seconds): 15,260 (4.2 hours).

Graphical Results: Figure 10 below.

[Plot: goal curve, form 1 and form 2 over percentiles 1 to 97]

Figure 10: Graphical Results for the AR-Test.

This figure shows results obtained for the AR-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
AS-Test (Auto and Shop):

General Requirements:
forms: 2;
items: 25 each; and
taxonomies: 2 (11, 13 items in taxonomy 1 and 2).

Settings:
CAT-values: 0.05, 0.1, 0.5;
penalties: 0.00001, 1, 5;
PARAWEI: 25; and
item pool: 196 items.

Numerical Results:
objective function value (lower bound): 2,788.00;
objective function value (best solution): 2,862.73;
integrality gap: 2.7 %; and
runtime (seconds): 215.

Graphical Results: Figure 11 below.

[Plot: goal curve, form 1 and form 2]

Figure 11: Graphical Results for the AS-Test.

This figure shows results obtained for the AS-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
EI-Test (Electronics Information):

General Requirements:
forms: 2;
items: 20 each; and
taxonomies: 4 (10, 4, 2, 4 items in taxonomy 1 to 4).

Settings:
CAT-values: 0.05, 0.1, 0.7;
penalties: 0.00001, 1, 10;
PARAWEI: 3; and
item pool: 190 items.

Numerical Results:
objective function value (lower bound): 9,489.65;
objective function value (best solution): 9,561.94;
integrality gap: 1.0 %; and
runtime (seconds): 17.

Graphical Results: Figure 12 below.

[Plot: goal curve, form 1 and form 2]

Figure 12: Graphical Results for the EI-Test.

This figure shows results obtained for the EI-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
GS - Test (General Science):

General Requirements:
forms: 2;
items: 25 each; and
taxonomies: 12 (3, 3, 4, 2, 2, 3, 1, 2, 2, 1, 1, 1 items in taxonomy 1 to 12).

Settings:
CAT-values: 0.05, 0.1, 0.5;
penalties: 1, 10, 100;
PARAWEI: 100; and
item pool: 313 items.

Numerical Results:
objective function value (lower bound): 8,095.66;
objective function value (best solution): 8,433.03;
integrality gap: 4.2 %; and
runtime (seconds): 312.



Graphical Results: Figure 13 below. 




Figure 13: Graphical Results for the GS-Test.

This figure shows results obtained for the GS-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
MC - Test (Mechanical Comprehension):

General Requirements:
forms: 4;
items: 25 each; and
taxonomies: 6 (11, 2, 2, 2, 4, 4 items in taxonomy 1 to 6).

Settings:
CAT-values: 0.01, 0.05, 0.1;
penalties: 0.00001, 1, 5;
PARAWEI: 300; and
item pool: 296 items.

Numerical Results:
objective function value (lower bound): 125.04;
objective function value (best solution): 1,187.83;
integrality gap: 850 %; and
runtime (seconds): 50,000 (13.8 hours).

Graphical Results: Figure 14 below. 










Figure 14: Graphical Results for the MC-Test. 

This figure shows results obtained for the MC-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 to
form 4 are the information curves for each form.
MK - Test (Mathematical Knowledge):

General Requirements:
forms: 4;
items: 25 each; and
taxonomies: 5 (3, 5, 9, 7, 1 items in taxonomy 1 to 5).

Settings:
CAT-values: 0.05, 0.1, 0.5;
penalties: 1, 10, 100;
PARAWEI: 300; and
item pool: 327 items.

Numerical Results:
objective function value (lower bound): 2,006.71;
objective function value (best solution): 7,278.24;
integrality gap: 7.3 %; and
runtime (seconds): 50,000 (13.8 hours).

Graphical Results: Figure 15 below. 







Figure 15: Graphical Results for the MK-Test. 

This figure shows results obtained for the MK-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 to
form 4 are the information curves for each form. 
WK - Test (Word Knowledge):

General Requirements:
forms: 2;
items: 35 each; and
taxonomies: 2 (13, 22 items in taxonomy 1 and 2).

Settings:
CAT-values: 0.01, 0.05, 0.1;
penalties: 0.000001, 1, 5;
PARAWEI: 500; and
item pool: 276 items.

Numerical Results:
objective function value (lower bound): 3,588.32;
objective function value (best solution): 5,188.42;
integrality gap: 39.2 %; and
runtime (seconds): 13,934 (3.9 hours).

Graphical Results: Figure 16 below. 




Figure 16: Graphical Results for the WK-Test. 

This figure shows results obtained for the WK-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
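
The integrality gaps listed in the numerical results can be checked with a few lines of Python. The sketch below assumes the gap is the distance between the best solution and the lower bound, expressed as a percentage of the lower bound. That definition is an assumption rather than something stated in this chapter, although it reproduces the values listed for the AS-, GS- and MC-Test. The function name `integrality_gap` is illustrative.

```python
def integrality_gap(lower_bound: float, best_solution: float) -> float:
    """Relative gap in percent, measured against the lower bound (assumed definition)."""
    if lower_bound <= 0:
        raise ValueError("lower bound must be positive for a relative gap")
    return 100.0 * (best_solution - lower_bound) / lower_bound


# (lower bound, best solution) pairs copied from the numerical results above.
reported = {
    "AS": (2788.00, 2862.73),
    "GS": (8095.66, 8433.03),
    "MC": (125.04, 1187.83),
}

gaps = {test: integrality_gap(lb, best) for test, (lb, best) in reported.items()}
for test, gap in gaps.items():
    print(f"{test}: {gap:.1f} %")
```

A small gap means the best solution found is provably close to optimal; a large gap, as for the MC-Test, only says that the bound is weak or the search was stopped early.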
C. RESULTS OF THE HEURISTIC APPROACH



The objective function implemented in the heuristic is
as follows:

$$\sum_{p} \sum_{f} \left( PY_{pf} + NY_{pf} \right)$$

This simplification of the objective function previously
used (i.e., unweighted deviations and no parallel subgoal)
was chosen for ease of computation.
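
As a minimal sketch of this simplified objective, the Python below sums the positive and negative deviations of each form's information from the goal at each percentile. It assumes that PY and NY denote the amounts by which a form curve lies above and below the goal curve. The function name and the small data set are invented for illustration and are not taken from the thesis data.

```python
from typing import Sequence


def heuristic_objective(goal: Sequence[float],
                        forms: Sequence[Sequence[float]]) -> float:
    """Sum over percentiles p and forms f of PY[p][f] + NY[p][f].

    PY is the amount by which a form's information exceeds the goal, NY the
    amount by which it falls short (assumed meaning); their sum is the
    absolute deviation at that percentile.
    """
    total = 0.0
    for form in forms:
        if len(form) != len(goal):
            raise ValueError("each form needs one information value per percentile")
        for target, info in zip(goal, form):
            py = max(info - target, 0.0)   # positive deviation
            ny = max(target - info, 0.0)   # negative deviation
            total += py + ny
    return total


# Illustrative numbers only: goal information at three percentiles, two forms.
goal = [4.0, 6.0, 5.0]
forms = [[4.5, 5.0, 5.0],
         [3.0, 6.0, 7.0]]
value = heuristic_objective(goal, forms)
print(value)
```

Because both deviations enter with a positive sign, a form that lies above the goal at one percentile cannot offset a shortfall at another.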

The following pages display the objective function
values per repetition (random restart) of the heuristic as
well as the graph for the best solution found (Figures 17 to
30).
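
The repetition scheme can be illustrated with a generic random-restart loop. The sketch below is not the Pascal heuristic of this thesis: the swap-based local search, the toy item pool, and all names (`deviation`, `local_search`, `random_restart`) are assumptions chosen only to show how one objective value per restart and a running minimum arise.

```python
import random
from typing import List, Tuple


def deviation(selected: List[int], info: List[float], target: float) -> float:
    """Absolute distance between the summed item information and the target."""
    return abs(sum(info[i] for i in selected) - target)


def local_search(selected: List[int], info: List[float],
                 target: float) -> Tuple[List[int], float]:
    """Swap one selected item for one unselected item while that improves the value."""
    current = deviation(selected, info, target)
    improved = True
    while improved:
        improved = False
        unselected = [i for i in range(len(info)) if i not in selected]
        for pos in range(len(selected)):
            for candidate in unselected:
                trial = selected[:pos] + [candidate] + selected[pos + 1:]
                value = deviation(trial, info, target)
                if value < current - 1e-12:
                    selected, current, improved = trial, value, True
                    break
            if improved:
                break
    return selected, current


def random_restart(info, form_size, target, repetitions, seed=0):
    """Run the local search from random starts; keep every value and the best form."""
    rng = random.Random(seed)
    values, best_form, best_value = [], None, float("inf")
    for _ in range(repetitions):
        start = rng.sample(range(len(info)), form_size)
        form, value = local_search(start, info, target)
        values.append(value)
        if value < best_value:
            best_form, best_value = form, value
    return values, best_form, best_value


# Toy pool of 30 items with made-up information values.
pool_rng = random.Random(1)
item_info = [round(pool_rng.uniform(0.1, 1.0), 3) for _ in range(30)]
values, best_form, best_value = random_restart(
    item_info, form_size=5, target=3.0, repetitions=20)
print(min(values), best_value)
```

Plotting `values` against the restart index, with a horizontal line at `best_value`, gives the kind of picture described in the captions for the objective function value figures.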

The heuristic algorithm is written in Standard Pascal
[e.g., Silicon Valley Software 1991] and runs on a Pentium 166
PC. Table 3 shows the runtimes and the objective function
values.



Test   Objective function value   Repetitions   Runtime (seconds)
AR                        97.74           100                 120
AS                       230.77           100                 150
EI                       227.10           100                 120
GS                       117.13           100                 130
MC                        47.80           100                 250
MK                       257.94           100                 280
WK                       280.68           100                 160

Table 3: Results for tests assembled with the Heuristic Approach.
As the runtimes show, the heuristic assembles each test in under five minutes.
AR - Test (Arithmetic Reasoning):

General Requirements:
forms: 2;
items: 30 each; and
taxonomies: 5 (7, 8, 5, 5, 5 items in taxonomy 1 to 5).

Execution Specifics:
repetitions: 100; and
objective function value: 97.74.







Figure 17: Objective Function Values for each Random Restart. 
The flat line indicates the minimum value. 




Figure 18: Graphical Results for the AR-Test. 

This figure shows results obtained for the AR-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
AS - Test (Auto and Shop):

General Requirements:
forms: 2;
items: 25 each; and
taxonomies: 2 (11, 13 items in taxonomy 1 and 2).

Execution Specifics:
repetitions: 100; and
objective function value: 230.77.




Figure 19: Objective Function Values for each Random Restart.

The flat line indicates the minimum value of the best solution obtained. 







Figure 20: Graphical Results for the AS-Test. 

This figure shows results obtained for the AS-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
EI - Test (Electronics Information):

General Requirements:
forms: 2;
items: 20 each; and
taxonomies: 4 (10, 4, 2, 4 items in taxonomy 1 to 4).

Execution Specifics:
repetitions: 100; and
objective function value: 227.10.







Figure 21: Objective Function Values for each Random Restart.

The flat line indicates the minimum value of the best solution obtained. 




Figure 22: Graphical Results for the EI-Test.

This figure shows results obtained for the EI-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
GS - Test (General Science):

General Requirements:
forms: 2;
items: 25 each; and
taxonomies: 12 (3, 3, 4, 2, 2, 3, 1, 2, 2, 1, 1, 1 items in taxonomy 1 to 12).

Execution Specifics:
repetitions: 100; and
objective function value: 117.18.




Figure 23: Objective Function Values for each Random Restart. 



The flat line indicates the minimum value of the best solution obtained. 







Figure 24: Graphical Results for the GS-Test. 

This figure shows results obtained for the GS-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
MC - Test (Mechanical Comprehension):

General Requirements:
forms: 4;
items: 25 each; and
taxonomies: 6 (11, 2, 2, 2, 4, 4 items in taxonomy 1 to 6).

Execution Specifics:
repetitions: 100; and
objective function value: 47.8.







Figure 25: Objective Function Values for each Random Restart. 

The flat line indicates the minimum value of the best solution obtained. 




Figure 26: Graphical Results for the MC-Test. 

This figure shows results obtained for the MC-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 to 
form 4 are the information curves for each form. 
MK - Test (Mathematical Knowledge):

General Requirements:
forms: 4;
items: 25 each; and
taxonomies: 5 (3, 5, 9, 7, 1 items in taxonomy 1 to 5).

Execution Specifics:
repetitions: 100; and
objective function value: 257.94.




Figure 27: Objective Function Values for each Random Restart. 

The flat line indicates the minimum value of the best solution obtained. 




Figure 28: Graphical Results for the MK-Test. 

This figure shows results obtained for the MK-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 to 
form 4 are the information curves for each form. 
WK - Test (Word Knowledge):

General Requirements:
forms: 2;
items: 35 each; and
taxonomies: 2 (13, 22 items in taxonomy 1 and 2).

Execution Specifics:
repetitions: 100; and
objective function value: 280.68.







Figure 29: Objective Function Values for each Random Restart. 

The flat line indicates the minimum value of the best solution obtained. 







Figure 30: Graphical Results for the WK-Test. 

This figure shows results obtained for the WK-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
D. DISCUSSION OF THE RESULTS

The optimization approach yields good results for the
form assembly. The assembled forms for five out of the seven
tests have information curves (form curves) that are very
close to the goal curve and parallel to each other. In the
EI- and WK-Test the form curves do not reach the goal curve
in the lower half of the percentile range. Setting the
weight of the parallel subgoal to zero for these two tests
does not improve the shape of the form curves. Increasing
the weight for the subgoal yields marginally more parallel
forms, but increases the overall distance to the goal curve
much more. Changing the bounds for the deviation variables
has little effect. Discussions with DMDC indicate that the
item pools for the EI- and WK-Test are known to be "weak"
because, in DMDC's opinion, too many items were extracted for
Computer Adaptive Testing. (See Wainer [1990] for a
description of this relatively new method of testing.) DMDC
is working to restock these item pools.

The heuristic yields good results for the AR-, AS- and
MC-Test. Results for the GS-Test are not very parallel in the
higher percentile range and results for the MK-Test are not
very parallel in the lower percentile range. The form curves
of the EI- and WK-Tests indicate the same deficiency in the
item pool in the lower half of the percentile range as
mentioned above. To see whether the heuristic results can be
improved, the number of repetitions was increased to 1,000
for the AS- and EI-Test. The objective function value
decreased from 230 to 225 in the AS-Test and only from 227
to 226 in the EI-Test.
V. USING BOTH OPTIMIZATION AND HEURISTIC APPROACHES



A. USING THE HEURISTIC SOLUTION AS A BOUND 

Table 4 summarizes a direct comparison of the objective 
function values where the heuristic solutions are converted 
to the objective function of the optimization model. 



Test   Optimization objective function value   Heuristic objective function value
AR                                    932.33                             1,840.57
AS                                  2,862.73                             4,938.76
EI                                  9,561.94                            10,676.62
GS                                  8,433.03                            11,830.96
MC                                  1,187.11                             3,377.86
MK                                  7,278.24                            72,095.93
WK                                  5,188.42                            17,883.56

Table 4: Comparison of the Results.

This table provides the objective function values for both the best
heuristic solution and the best solution obtained solving the
optimization model using the optimization model's objective function.

The optimization approach yields smaller objective function
values than the heuristic, as would be expected when the
optimization model's objective function is used for the
evaluation. However, it is surprising that the differences are
so great when the graphical results look similar. For the
AR-Test, the heuristic solution in the percentile range 20
to 50 is not as parallel as the optimization solution, and
this difference is responsible for nearly doubling the
objective function value. The AS-Test is similar: form 2 is
constantly below form 1. For the MK- and WK-Tests the
corresponding objective function values are 9.9 and 3.5 times
higher than the optimization result. The higher objective
function value of the heuristic solution for the MK-Test
(Figure 15 and Figure 28) is caused by the parallel gap
between form one and the three other forms in the lower
percentiles combined with a high weight for the parallel
subgoal. In the WK-Test the alternating behavior of the
forms around each other in Figure 16 is similar to the
heuristic solution (Figure 30). However, form one obviously
dominates form two in the lower percentile range. The
heuristic solution for the MC-Test has a higher value than
that of the optimization solution; however, the graphical
result of the heuristic looks much better than that of the
optimization. This is most likely due to the cancellation
effect of positive and negative distances in Figure 14.
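
The size of these differences can be read off Table 4 by dividing each heuristic value by the corresponding optimization value. The short Python check below does this; the numbers are copied from Table 4 and the variable names are illustrative.

```python
# (optimization value, heuristic value) pairs copied from Table 4.
table4 = {
    "AR": (932.33, 1840.57),
    "AS": (2862.73, 4938.76),
    "EI": (9561.94, 10676.62),
    "GS": (8433.03, 11830.96),
    "MC": (1187.11, 3377.86),
    "MK": (7278.24, 72095.93),
    "WK": (5188.42, 17883.56),
}

# Ratio of the heuristic value to the optimization value for each test.
ratios = {test: heuristic / optimization
          for test, (optimization, heuristic) in table4.items()}

# Print the tests from the smallest to the largest ratio.
for test, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{test}: heuristic value is {ratio:.2f} times the optimization value")
```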

Using the heuristic solution as an upper bound for the
objective function value when solving the model with GAMS and
OSL yields better results in almost all cases, as shown in
Table 5. The MC-Test is an exception: the best solution with
the heuristic bound is worse than without it. While this may
happen due to OSL's branching choice within its branch and
bound enumeration, having a bound should help in almost all
cases.
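
The effect of an initial upper bound on a branch and bound search can be sketched on a toy problem. The Python below is not the GAMS/OSL model: it solves a small assignment problem by depth-first branch and bound, once without an incumbent and once seeded with the value of a greedy solution that stands in for the heuristic. All names and the cost matrix are hypothetical.

```python
import math
from typing import List, Tuple


def branch_and_bound(cost: List[List[float]],
                     upper_bound: float = math.inf) -> Tuple[float, int]:
    """Minimize the total cost of assigning each row a distinct column.

    Returns the best objective value known at the end and the number of
    nodes visited. `upper_bound` is the value of a known feasible solution.
    """
    n = len(cost)
    row_min = [min(row) for row in cost]
    # suffix_min[r] is a lower bound on the cost of rows r..n-1.
    suffix_min = [0.0] * (n + 1)
    for r in range(n - 1, -1, -1):
        suffix_min[r] = suffix_min[r + 1] + row_min[r]

    best = upper_bound
    nodes = 0

    def visit(row: int, used: int, partial: float) -> None:
        nonlocal best, nodes
        nodes += 1
        if row == n:
            best = min(best, partial)
            return
        for col in range(n):
            if used & (1 << col):
                continue
            new_partial = partial + cost[row][col]
            # Prune when even the optimistic completion cannot beat the incumbent.
            if new_partial + suffix_min[row + 1] >= best:
                continue
            visit(row + 1, used | (1 << col), new_partial)

    visit(0, 0, 0.0)
    return best, nodes


def greedy_value(cost: List[List[float]]) -> float:
    """Cheapest free column for each row in turn; stands in for the heuristic."""
    used, total = set(), 0.0
    for row in cost:
        col = min((c for c in range(len(row)) if c not in used),
                  key=lambda c: row[c])
        used.add(col)
        total += row[col]
    return total


cost = [
    [9, 2, 7, 8, 6],
    [6, 4, 3, 7, 5],
    [5, 8, 1, 8, 4],
    [7, 6, 9, 4, 3],
    [3, 5, 8, 6, 2],
]
plain_value, plain_nodes = branch_and_bound(cost)
seeded_value, seeded_nodes = branch_and_bound(cost, upper_bound=greedy_value(cost))
print(plain_value, plain_nodes, seeded_value, seeded_nodes)
```

In an exhaustive search of this kind, with a fixed branching order, the seeded run visits no more nodes than the unseeded one. A solver that is stopped at a time limit and chooses its own branching order gives no such guarantee.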
Test   Objective function   Objective function   Change of integrality   Change of runtime
       value (unbounded)    value (bounded)      gap (%)                 (sec)
AR                 932.33               905.13                    -3.1              -7,690
AS               2,862.73             2,809.90                    -1.9                +642
EI               9,561.94             9,542.28                    -0.4                +110
GS               8,433.03             8,443.81                     0.0                +337
MC               1,187.11             2,605.13                 +1130.0                   0
MK               7,278.24             6,532.59                   -30.0             -47,567
WK               5,188.42             5,127.66                    -1.6              -1,730

Table 5: Results of the Optimization Starting with the Best Heuristic
Solution.

This table shows a comparison of the results for the optimization
approach when the heuristic solution bounds the objective function. A
negative number indicates an improvement in time or in the integrality
gap.
B. CONCLUSIONS



This thesis demonstrates how a linear mixed integer goal
program can support DMDC's form assembly process. The
developed heuristic is a good supplement that can be used
with the optimization approach described. In some cases the
heuristic solution yields good upper bounds for the
optimization that can decrease the computation time.

C. RECOMMENDATIONS 

The optimization model should be extended to capture 
the other three ASVAB-Tests. 

This heuristic algorithm should be considered a
prototype. Experiments should be conducted with the objective
function to find the most useful expression. While changing
the objective function to match that currently implemented
in the optimization model would be a natural first step,
experimentation should be more expansive. The heuristic can
easily accommodate a nonlinear objective function (an option
not available in integer linear programming).

Further research can also be conducted to implement a 
heuristic for Computer Adaptive Testing. 